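In real-life applications, machine learning models often face a shift in data distribution between the training and test domains. When the goal is to make predictions on distributions different from those seen during training, we face a domain generalization problem. Methods that address this problem learn a model from data drawn from multiple source domains and then apply it to an unseen target domain. Our hypothesis is that, when training on multiple domains, the conflicting gradients within each mini-batch carry information specific to individual domains that is irrelevant to the others, including the test domain. If left untouched, this disagreement may degrade generalization performance. In this work, we characterize the conflicting gradients that emerge under domain shift and devise novel gradient agreement strategies, based on gradient surgery, to alleviate their effect. We validate our approach on image classification tasks with three multi-domain datasets, showing the value of the proposed agreement strategies in improving the generalization capability of deep learning models under domain shift.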
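To make the idea of gradient agreement concrete, the following is a minimal sketch of one possible sign-agreement rule: a parameter is updated only when the gradients from every source domain point in the same direction, and conflicting components are zeroed. The function name `agree_sum` and the zeroing rule are assumptions made for illustration; the abstract does not specify the exact strategies, so this should not be read as the authors' method.

```python
from typing import List, Sequence


def agree_sum(domain_grads: Sequence[Sequence[float]]) -> List[float]:
    """Merge per-domain gradients, keeping only sign-consistent components.

    domain_grads[d][j] is the gradient of domain d's loss with respect to
    parameter j. Hypothetical helper, not taken from the paper.
    """
    if not domain_grads:
        raise ValueError("need at least one domain gradient")
    n_params = len(domain_grads[0])
    merged = []
    for j in range(n_params):
        column = [g[j] for g in domain_grads]
        all_positive = all(v > 0 for v in column)
        all_negative = all(v < 0 for v in column)
        # Keep the summed component only if every domain agrees on its sign;
        # otherwise treat it as domain-specific and drop it.
        merged.append(sum(column) if (all_positive or all_negative) else 0.0)
    return merged


if __name__ == "__main__":
    # Two source domains, three parameters. The second parameter conflicts.
    grads = [[0.5, -1.0, 0.25],
             [0.25, 2.0, 0.125]]
    print(agree_sum(grads))
```

In a training loop, the merged vector would replace the plain average of the per-domain gradients before the optimizer step.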
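Ellipsometry techniques measure the polarization properties of materials, but they require precise rotation of optical components under different light and sensor configurations. This leads to cumbersome capture devices that must be carefully calibrated under laboratory conditions, and to very long acquisition times, usually on the order of several days per object. Recent techniques can capture polarimetric reflectance information, but they are either limited to a single view or cover all view directions only for spherical objects made of a single homogeneous material. We present sparse ellipsometry, a portable polarimetric acquisition method that captures polarimetric SVBRDF and 3D shape simultaneously. Our handheld device consists of off-the-shelf, fixed optical components. The total acquisition time per object is on the order of twenty minutes rather than days. We develop a complete polarimetric SVBRDF model that includes diffuse and specular components as well as single scattering, and we devise a novel polarimetric inverse rendering algorithm that uses generative modeling to augment the specular reflection samples. Our results show strong agreement with a recent ground-truth dataset of polarimetric BRDFs captured from real-world objects.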
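This work aims to understand the impact of class imbalance on the performance of chest X-ray classifiers, in light of the standard evaluation practices that researchers adopt for discrimination and calibration performance. First, we conducted a literature study to analyze common scientific practice and confirmed that: (1) even when dealing with highly imbalanced datasets, the community tends to use metrics that are dominated by the majority class; and (2) calibration studies of chest X-ray classifiers remain rare, despite their importance in healthcare. Second, we ran systematic experiments on two major chest X-ray datasets to explore how several performance metrics behave under different class ratios, and we show that widely adopted metrics can hide poor performance on the minority class. Finally, we propose adopting two alternative metrics, the precision-recall curve and the balanced Brier score, which better reflect system performance in this setting. Our results indicate that the evaluation practices currently adopted by the chest X-ray classifier research community may not reflect the performance of computer-aided diagnosis systems in real clinical scenarios, and we suggest alternatives to improve this situation.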
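The contrast between a majority-dominated metric and a class-balanced one can be illustrated with a small sketch. The abstract does not define the balanced Brier score, so the definition below (the unweighted mean of the Brier score computed separately on positive and negative samples) is an assumption, and the function names are hypothetical.

```python
from typing import Sequence


def brier_score(y_true: Sequence[int], p_pos: Sequence[float]) -> float:
    """Standard Brier score: mean squared error of predicted probabilities."""
    return sum((p - y) ** 2 for y, p in zip(y_true, p_pos)) / len(y_true)


def balanced_brier_score(y_true: Sequence[int], p_pos: Sequence[float]) -> float:
    """Assumed definition: average of the per-class Brier scores."""
    pos_errors = [(p - 1.0) ** 2 for y, p in zip(y_true, p_pos) if y == 1]
    neg_errors = [p ** 2 for y, p in zip(y_true, p_pos) if y == 0]
    if not pos_errors or not neg_errors:
        raise ValueError("both classes must be present")
    pos_mean = sum(pos_errors) / len(pos_errors)
    neg_mean = sum(neg_errors) / len(neg_errors)
    return 0.5 * (pos_mean + neg_mean)


if __name__ == "__main__":
    # One positive among four samples; the classifier is unsure about it.
    labels = [1, 0, 0, 0]
    probs = [0.5, 0.0, 0.0, 0.0]
    print("Brier:", brier_score(labels, probs))
    print("Balanced Brier:", balanced_brier_score(labels, probs))
```

On imbalanced data the standard score is diluted by the many easy negatives, while the per-class average gives the minority-class error equal weight.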
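Data augmentation is an important component both in evaluating the robustness of natural language processing (NLP) models and in increasing the diversity of the data they are trained on. In this paper, we present NL-Augmenter, a new participatory, Python-based natural language augmentation framework that supports the creation of both transformations (modifications to the data) and filters (data splits according to specific features). We describe the framework and an initial set of 117 transformations and 23 filters for a variety of natural language tasks. We demonstrate the efficacy of NL-Augmenter by using several of its transformations to analyze the robustness of popular natural language models. The infrastructure, datacards, and robustness analysis results are publicly available in the NL-Augmenter repository (\url{https://github.com/gem-benchmark/nl-augmenter}).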
Making histopathology image classifiers robust to a wide range of real-world variability is a challenging task. Here, we describe a candidate deep learning solution for the Mitosis Domain Generalization Challenge 2022 (MIDOG) that addresses generalization for mitosis detection in images of hematoxylin-eosin-stained histology slides under high variability (scanner, tissue type, and species). Our approach consists of training a rotation-invariant deep learning model using aggressive data augmentation, with a training set enriched with hard negative examples and automatically selected negative examples from the unlabeled part of the challenge dataset. To optimize the performance of our models, we investigated a hard negative mining regime search procedure that led us to train our best model using a subset of image patches representing 19.6% of our training partition of the challenge dataset. Our candidate model ensemble achieved an F1-score of 0.697 on the final test set after automated evaluation on the challenge platform, the third-best overall score in the MIDOG 2022 Challenge.
Reading comprehension of legal text can be a particularly challenging task due to the length and complexity of legal clauses and a shortage of expert-annotated datasets. To address this challenge, we introduce the Merger Agreement Understanding Dataset (MAUD), an expert-annotated reading comprehension dataset based on the American Bar Association's 2021 Public Target Deal Points Study, with over 39,000 examples and over 47,000 total annotations. Our fine-tuned Transformer baselines show promising results, with models performing well above random on most questions. However, on a large subset of questions, there is still room for significant improvement. As the only expert-annotated merger agreement dataset, MAUD is valuable as a benchmark for both the legal profession and the NLP community.
Numerous works use word embedding-based metrics to quantify societal biases and stereotypes in texts. Recent studies have found that word embeddings can capture semantic similarity but may be affected by word frequency. In this work we study the effect of frequency when measuring female vs. male gender bias with word embedding-based bias quantification methods. We find that Skip-gram with negative sampling and GloVe tend to detect male bias in high frequency words, while GloVe tends to return female bias in low frequency words. We show these behaviors still exist when words are randomly shuffled. This proves that the frequency-based effect observed in unshuffled corpora stems from properties of the metric rather than from word associations. The effect is spurious and problematic since bias metrics should depend exclusively on word co-occurrences and not individual word frequencies. Finally, we compare these results with the ones obtained with an alternative metric based on Pointwise Mutual Information. We find that this metric does not show a clear dependence on frequency, even though it is slightly skewed towards male bias across all frequencies.
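A small sketch may help show why a PMI-based bias measure can be less sensitive to a word's own frequency. The abstract does not give the formula, so the definition below (the difference between the word's PMI with a female context set and with a male context set) is an assumption, and all function and parameter names are hypothetical. Under that definition the word's own count and the corpus size cancel algebraically.

```python
import math


def pmi(c_xy: float, c_x: float, c_y: float, total: float) -> float:
    """Pointwise mutual information from raw co-occurrence counts."""
    return math.log((c_xy * total) / (c_x * c_y))


def pmi_bias(c_w_a: float, c_w_b: float, c_a: float, c_b: float) -> float:
    """Assumed bias measure: PMI(w, A) - PMI(w, B).

    c_w_a, c_w_b: co-occurrence counts of word w with context sets A and B.
    c_a, c_b: total counts of context sets A and B.
    The count of w itself and the corpus size cancel in the difference.
    """
    return math.log((c_w_a / c_a) / (c_w_b / c_b))


if __name__ == "__main__":
    # Same co-occurrence ratio, two different word frequencies (50 vs 5000).
    rare = pmi(20, 50, 100, 10_000) - pmi(10, 50, 100, 10_000)
    frequent = pmi(20, 5_000, 100, 10_000) - pmi(10, 5_000, 100, 10_000)
    print(rare, frequent, pmi_bias(20, 10, 100, 100))
```

Positive values indicate a stronger association with context set A and negative values with B. Smoothing for zero counts is omitted here for brevity.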
Real-life tools for decision-making in many critical domains are based on ranking results. With the increasing awareness of algorithmic fairness, recent works have presented measures for fairness in ranking. Many of those definitions consider the representation of different ``protected groups'' in the top-$k$ ranked items, for any reasonable $k$. Given the protected groups, confirming algorithmic fairness is a simple task. However, the groups' definitions may be unknown in advance. In this paper, we study the problem of detecting groups with biased representation in the top-$k$ ranked items, eliminating the need to pre-define protected groups. The number of possible groups can be exponential, making the problem hard. We propose efficient search algorithms for two different fairness measures: global representation bounds and proportional representation. We then propose a method that uses Shapley values to explain the bias in the representation of groups. We conclude with an experimental study showing the scalability of our approach and demonstrating the usefulness of the proposed algorithms.
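The detection problem can be illustrated with a brute-force sketch that enumerates every attribute-value pattern and flags those whose share of the top-$k$ falls below a fraction of their share in the full ranking. This is only a naive baseline to show why the search space grows exponentially; it is not the efficient search described in the abstract. The function name, the `alpha` threshold, and the exact form of the proportionality test are assumptions.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def underrepresented_groups(
    ranked: Sequence[Dict[str, str]],
    attrs: Sequence[str],
    k: int,
    alpha: float = 0.8,
) -> List[Tuple[Dict[str, str], int, float]]:
    """Flag groups with fewer than alpha * expected members in the top-k.

    ranked: items ordered best first, each a dict of attribute values.
    Returns (group pattern, count in top-k, proportional expectation).
    """
    n = len(ranked)
    top = ranked[:k]
    flagged = []
    # Every non-empty subset of attributes defines a family of groups,
    # which is what makes the number of candidate groups exponential.
    for r in range(1, len(attrs) + 1):
        for subset in combinations(attrs, r):
            patterns = {tuple(item[a] for a in subset) for item in ranked}
            for values in sorted(patterns):
                def matches(item: Dict[str, str]) -> bool:
                    return all(item[a] == v for a, v in zip(subset, values))

                size = sum(1 for item in ranked if matches(item))
                in_top = sum(1 for item in top if matches(item))
                expected = k * size / n  # proportional share of the top-k
                if in_top < alpha * expected:
                    flagged.append((dict(zip(subset, values)), in_top, expected))
    return flagged


if __name__ == "__main__":
    ranking = [{"gender": "M"}, {"gender": "M"}, {"gender": "F"}, {"gender": "F"}]
    print(underrepresented_groups(ranking, ["gender"], k=2))
```

With several attributes the same call also reports intersectional groups, such as a specific gender and age band combined, without those groups being defined in advance.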
Diabetic Retinopathy (DR) is a leading cause of vision loss in the world, and early DR detection is necessary to prevent vision loss and support appropriate treatment. In this work, we leverage interactive machine learning and introduce a joint learning framework, termed DRG-Net, to effectively learn both disease grading and multi-lesion segmentation. Our DRG-Net consists of two modules: (i) DRG-AI-System, which classifies DR grade, localizes lesion areas, and provides visual explanations; and (ii) DRG-Expert-Interaction, which receives feedback from expert users and improves the DRG-AI-System. To deal with sparse data, we utilize transfer learning mechanisms to extract invariant feature representations by using Wasserstein distance and adversarial learning-based entropy minimization. In addition, we propose a novel attention strategy over both low- and high-level features to automatically select the most significant lesion information and provide explainable properties. In terms of human interaction, we further develop DRG-Net as a tool that enables expert users to correct the system's predictions, which may then be used to update the system as a whole. Moreover, thanks to the attention mechanism and the loss-function constraints between lesion features and classification features, our approach can remain robust given a certain level of noise in user feedback. We have benchmarked DRG-Net on the two largest DR datasets, i.e., IDRID and FGADR, and compared it to various state-of-the-art deep learning networks. In addition to outperforming other SOTA approaches, DRG-Net is effectively updated using user feedback, even in a weakly-supervised manner.
Participants in political discourse employ rhetorical strategies -- such as hedging, attributions, or denials -- to display varying degrees of belief commitments to claims proposed by themselves or others. Traditionally, political scientists have studied these epistemic phenomena through labor-intensive manual content analysis. We propose to help automate such work through epistemic stance prediction, drawn from research in computational semantics, to distinguish at the clausal level what is asserted, denied, or only ambivalently suggested by the author or other mentioned entities (belief holders). We first develop a simple RoBERTa-based model for multi-source stance predictions that outperforms more complex state-of-the-art modeling. Then we demonstrate its novel application to political science by conducting a large-scale analysis of the Mass Market Manifestos corpus of U.S. political opinion books, where we characterize trends in cited belief holders -- respected allies and opposed bogeymen -- across U.S. political ideologies.